compaq_armada_small_bw.gif (5457 bytes) Alf P. Steinbach´s personal homepage: Programming: Narrow topics

Efficiently updating products over multisets

Contents:

About the notation.
: Multiset operations.
: Product over a multiset.
: Narrowing: non-zero versus zero values.
: An abstract datatype.
: A generalization.
: Multiset product versus numeric product.

In some situations (examples include Dempster-Shafer evidence combination and some kinds of curve interpolation) you need the product of a set of numbers, where this set is incrementally updated. When the set is a multiset, that is, the same number can occur more than once, and the set can contain the number 0, it becomes a problem to incrementally update the product along with the set. The reason is simple: it´s not meaningful to divide by 0 (corresponding to a removal of 0 from the set).

Happily the solution is as simple as the problem: just keep a count of the number of zeroes in the set, along with the product of the non-zero elements. From this information the product over the set can be found in constant time. However, I´ve not seen this solution described or implemented anywhere, which goes to prove that simplicity is simply hard to achieve.

I first encountered the problem, and devised the solution described here, while working on a term assignment as a 4th year honors student at Heriot Watt University, Edinburgh, late 1986, and then used it (but not in the abstract form presented here) in my B.Sc. dissertation. Since I didn´t follow procedures (nobody knew what I did until a few days before the exams), and almost flunked out of the university, I don´t know whether that dissertation is available anywhere, but at least it was submitted in three or four copies, and I received my degree. I heard they had to change the rules after my unorthodox stunt...

Last updated Wednesday, February 03, 1999

Efficiently updating products over multisets

About the notation.

Red bold font indicate a definition. The notation "<>" is the "not equal" operator. The symbol "·" is multiplication.

Multiset operations.

A multiset is a set which can contain more than one instance of each element in the set. The number of occurrences of an element a in a set s is here called the member count of a in s. Usually the member count is a natural number, that is, a non-negative integer; here it can be any real number. The notation employed here is non-standard, as is the concept of a multiset allowing real-numbered member counts.

	Operation:	Description & definition:
	#( a, s )	Member count of a in s. Defining operation.
	Ø	The empty set. For all a: #( a, Ø ) = 0.
	{.a.}	Singleton set. #( a, {.b.} ) = (if a = b then 1 else 0).
	n·s	Member count scaling. #( a, n·s ) = n·#( a, s ).
	s1 (+) s2	Multiset union. #( a, s1 (+) s2 ) = #( a, s1 ) + #( a, s2 ).
	s1 (-) s2	Multiset difference. #( a, s1 (-) s2 ) = #( a, s1 ) - #( a, s2 ).

Conceptually a number a is an element (or member) of a set s iff the member count of e in s is non-zero.

Product over a multiset.

Let P stand for the usual product-of operator (uppercase pi, which is difficult to present here since Microsoft FrontPage, my HTML editor, doesn´t support HTML math extensions). Then, using the convention that 0⁰ = 1, the product P( s ) over a multiset s can be defined as

P( s ) = P_a(a^{#(
a, s )})

P( s ) is undefined when #( 0, s ) < 0, since the product then involves a negative power of 0.

By the definition above, P( s ) <> 0 when #( 0, s ) = 0; in particular, P( Ø ) = 1.

By the definition above, P( s ) = 0 when #( 0, s ) > 0.

Note that P distributes over multiset union, as it should:

P( s1 (+) s2 ) = P_a(a^{#(
a, s1 (+) s2 )}) = P_a(a^{#(
a, s1 ) + #( a, s2 )}) = P_a(a^{#(
a, s1 )}·a^{#( a, s2 )}) = P( s1 )·P( s2 )

This is just a restatement of the simple in a less than simple notation, but the property is crucial.

Narrowing: non-zero versus zero values.

In order to efficiently update the product it´s useful to differentiate between the zero and non-zero values. To do so it´s convenient to define a narrowing operation narrow( s, a ), which extracts the subset of a values from s, as

narrow( s, a ) = #( a, s )·{.a.}

The following is obvious, but as for the distributive property of P it´s crucial:

#( a, s ) = #( a, narrow( s, a ) )

The product of non-zero values, Pnz( s ), and the product of zero values, Pz( s ), can be now defined as respectively

Pz( s ) = P( narrow( s, 0 ) )

Pnz( s ) = P( s (-) narrow( s, 0 ) )

and, since P distributes over multiset union, we have

Pnz( s )·Pz( s ) = P( s (-) narrow( s, 0 ) (+) narrow( s, 0 ) ) = P( s )

Incremental updating of P( s ) is now reduced to incremental updating of Pz( s ) and Pnz( s ). And Pz( s ) can be computed in constant time from the count of zeros #( 0, s ) = #( 0, narrow( 0, s ) ):

Pz( s ) = (if #( 0, s ) = 0 then 1 else if #( 0, s ) > 0 then 0).

If #( 0, s ) < 0 then Pz( s ) is undefined.

Thus, only the values Pnz( s ) and #( 0, s ) need to be kept and updated in order to compute the product. The actual multiset need not be represented.

An abstract data type.

To represent the Pnz( s ) and #( 0, s ) values the following abstract data type r (just a representative pair of numbers) is convenient:

r( a, b )_p = a

r( a, b )_z = b

r( s ) = r( Pnz( s ), #( 0, s ) )

Pnz( s ) and Pz( s ), and thus P( s ), are trivially expressed in terms of the set´s representative pair:

Pnz( s ) = r( s )_p

Pz( s ) = (if r( s )_z = 0 then 1 else if r( s )_z > 0 then 0)

P( s ) = Pnz( s )·Pz( s ) = (if r( s )_z = 0 then r( s )_p else if r( s )_z > 0 then 0)

And the representative pair is equally trivially updated when a value is added to or removed from the set:

r( s (+) {.a.} ) = (if a = 0 then r( r( s )_p, r( s )_z + 1 ) else r( a·r( s )_p, r( s )_z ))

r( s (-) {.a.} ) = (if a = 0 then r( r( s )_p, r( s )_z - 1 ) else r( r( s )_p/a, r( s )_z ))

These update operations are all that´s needed, but while efficient (you can´t do better than constant time) it´s not exactly elegant...

A generalization.

The singleton update operations above are special cases of the general set update operations

r( s1 (+) s2 ) = r( r( s1 )_p·r( s2 )_p, r( s1 )_z + r( s2 )_z )

r( s1 (-) s2 ) = r( r( s1 )_p/r( s2 )_p, r( s1 )_z - r( s2 )_z )

These more general but not immediately useful operations suggest defining the following operations on representative pairs:

r1 × r2 = r( r1_p·r2_p, r1_z + r2_z )

r1 ÷ r2 = r( r1_p/r2_p, r1_z - r2_z )

(Note that extreme care must be exercised if these operations are implemented in terms of complex arithmetic operations, since there´s no periodicity here.)

The singleton update operations now become simply

r( s (+) {.a.} ) = r( s ) × r( {.a.} )

r( s (-) {.a.} ) = r( s ) ÷ r( {.a.} )

where, by the earlier definitions,

r( {.a.} ) = r( Pnz( {.a.} ), #( 0, {.a.} ) ) = (if a = 0 then r( 1, 1 ) else r( a, 0 ))

which suggests defining

r( a ) = r( {.a.} ) = (if a = 0 then r( 1, 1 ) else r( a, 0 ))

P( r ) = (if r_z = 0 then r_p else if r_z > 0 then 0)

corresponding to an identification of the singleton set {.a.} with the value a. With this identification and the direct definition of P all calculations can be done in terms of numeric values and representative pairs, where the multiset is only implied.

Multiset product versus numeric product.

Multiset products are isomorphic to ordinary numeric products:

P( r( a ) × r( b ) ) = P( r( {.a.} (+) {.b.} ) ) = a·b

P( r( a ) ÷ r( b ) ) = P( r( {.a.} (-) {.b.} ) ) = a/b when not( a = 0 and b = 0 )

Hence the numeric product laws for commutativity and associativity hold also for multiset representative pairs.

However, r( 0 ) is not a zero of multiset representative pairs, and P( r( 0 ) ÷ r( 0 ) ) = 1 instead of undefined (or in computer programming, a NaN).